Noun Phrase Recognition by System Combination

Author

  • Erik F. Tjong Kim Sang
Abstract

The performance of machine learning algorithms can be improved by combining the output of different systems. In this paper we apply this idea to the recognition of noun phrases. We generate different classifiers by using different representations of the data. By combining the results with voting techniques described in (Van Halteren et al., 1998) we manage to improve the best reported performances on standard data sets for base noun phrases and arbitrary noun phrases.

1 Introduction

(Van Halteren et al., 1998) and (Brill and Wu, 1998) describe a series of successful experiments for improving the performance of part-of-speech taggers. Their results have been obtained by combining the output of different taggers with system combination techniques such as majority voting. This approach cancels errors that are made by the minority of the taggers. With the best voting technique, the combined results decrease the lowest error rate of the component taggers by as much as 19% (Van Halteren et al., 1998). The fact that combination of classifiers leads to improved performance has been reported in a large body of machine learning work.

We would like to know what improvement combination techniques would cause in noun phrase recognition. For this purpose, we apply a single memory-based learning technique to data that has been represented in different ways. We compare various combination techniques on a part of the Penn Treebank and use the best method on standard data sets for base noun phrase recognition and arbitrary noun phrase recognition.

2 Methods and experiments

In this section we start with a description of our task: recognizing noun phrases. After this we introduce the different data representations we use and our machine learning algorithms. We conclude with an outline of techniques for combining classifier results.
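The majority voting idea mentioned in the introduction can be illustrated with a minimal sketch. It is not the voting method of (Van Halteren et al., 1998) in any of its weighted variants, only plain per-token majority voting. The function name `majority_vote`, the I/O tag set, the three example systems, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tag sequences from several classifiers.

    predictions: list of tag sequences, one per classifier, all of equal length.
    Ties go to the earliest classifier in the list (an assumption of this
    sketch, not something the text specifies).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all classifiers must tag the same number of tokens")
    combined = []
    for tags in zip(*predictions):
        counts = Counter(tags)
        best = max(counts.values())
        # Pick the first tag, in classifier order, that reaches the top count.
        combined.append(next(t for t in tags if counts[t] == best))
    return combined


# Three hypothetical classifiers tagging five tokens (I = inside NP, O = outside).
system_a = ["I", "I", "O", "I", "O"]
system_b = ["I", "O", "O", "I", "O"]
system_c = ["I", "I", "O", "O", "O"]
print(majority_vote([system_a, system_b, system_c]))
```

In this example each system makes at most one deviating decision per token, so the two agreeing systems outvote the third on every token, which is how errors made by a minority of the systems are cancelled.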
2.1 Task description

Noun phrase recognition can be divided into two tasks: recognizing base noun phrases and recognizing arbitrary noun phrases. Base noun phrases (baseNPs) are noun phrases which do not contain another noun phrase. For example, the sentence

In [ early trading ] in [ Hong Kong ] [ Monday ] , [ gold ] was quoted at [ $ 366.50 ] [ an ounce ] .

contains six baseNPs (marked as phrases between square brackets). The phrase $ 366.50 an ounce is a noun phrase as well. However, it is not a baseNP since it contains two other noun phrases.

Two baseNP data sets have been put forward by (Ramshaw and Marcus, 1995). The main data set consists of four sections (15-18) of the Wall Street Journal (WSJ) part of the Penn Treebank (Marcus et al., 1993) as training material and one section (20) as test material (footnote 1). The baseNPs in this data are slightly different from the ones that can be derived from the Treebank, most notably in the attachment of genitive markers.

The recognition task involving arbitrary noun phrases attempts to find both baseNPs and noun phrases that contain other noun phrases. A standard data set for this task was put forward at the CoNLL-99 workshop. It consists of the same parts of the Penn Treebank as the main baseNP data set: WSJ sections 15-18 as training data and section 20 as test data (footnote 2). The noun phrases in this data set are the same as in the Treebank and therefore the baseNPs in this data set are slightly different from the ones in the (Ramshaw and Marcus, 1995) data sets.

In both tasks, performance is measured with three scores. First, with the percentage of detected noun phrases that are correct (precision). Second, with the percentage of noun phrases in the data that were found by the classifier (recall). And third,

Footnote 1: This (Ramshaw and Marcus, 1995) baseNP data set is available via ftp://ftp.cis.upenn.edu/pub/chunker/
Footnote 2: Software for generating the data is available from http://lcg-www.uia.ac.be/conl199/npb/
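The first two scores defined above, precision and recall, can be worked through on the example sentence from this section. The following is a sketch under stated assumptions: the helper names `np_spans` and `precision_recall` are hypothetical, a noun phrase counts as correct only when both of its boundaries match exactly, and the "predicted" bracketing is an invented system output, not a result from the paper.

```python
def np_spans(bracketed):
    """Return the set of (start, end) token spans marked with [ ... ].

    Handles flat (non-nested) brackets only, which is enough for baseNPs.
    Indices count words and punctuation, not the brackets; end is exclusive.
    """
    spans, start, index = set(), None, 0
    for token in bracketed.split():
        if token == "[":
            start = index
        elif token == "]":
            spans.add((start, index))
            start = None
        else:
            index += 1
    return spans


def precision_recall(predicted, gold):
    """Precision and recall as percentages over exactly matching spans."""
    correct = len(predicted & gold)
    precision = 100.0 * correct / len(predicted) if predicted else 0.0
    recall = 100.0 * correct / len(gold) if gold else 0.0
    return precision, recall


# Gold baseNPs: the example sentence from the task description.
gold = np_spans("In [ early trading ] in [ Hong Kong ] [ Monday ] , [ gold ] "
                "was quoted at [ $ 366.50 ] [ an ounce ] .")
# Hypothetical system output: misses "Monday" and merges the last two baseNPs.
predicted = np_spans("In [ early trading ] in [ Hong Kong ] Monday , [ gold ] "
                     "was quoted at [ $ 366.50 an ounce ] .")

print(precision_recall(predicted, gold))  # (75.0, 50.0)
```

Here three of the four predicted phrases match a gold baseNP exactly (precision 75%), and three of the six gold baseNPs are found (recall 50%).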

Similar resources

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information in the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...

Noun Phrase Recognition with Tree Patterns

This paper offers a method for the noun phrase recognition of Hungarian natural language texts based on machine learning methods. The approach learns noun phrase tree patterns described by regular expressions from an annotated corpus. The tree patterns are completed with probability values using error statistics. The noun phrase recognition parser tries to find the best-fitting trees for a sent...

Memory-Based Shallow Parsing

We present memory-based learning approaches to shallow parsing and apply these to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods for improving the performance of the memory-based learner. Our approach is evaluated on standard data sets and t...

KitAi-VAL: Textual Entailment Recognition System for NTCIR-11 RITE-VAL

Method      MethodFV1   MethodFV2   MethodFV3
Macro-F1    50.95       56.37       54.65
Accuracy    58.37       57.59       57.00
CorrectAR   30.27       19.02       28.23

Two strategies. Search log method (MethodFV1): only search log information; 47 features for SVM (# of documents in each search result, # of documents retrieved with n-queries, the size of query words from t2, tfidf value in the retrieved documents, and so on). Summarization metho...

Do Heavy-NP Shift Phenomenon and Constituent Ordering in English Cause Sentence Processing Difficulty for EFL Learners?

Heavy-NP shift occurs when speakers prefer placing lengthy or “heavy” noun phrase direct objects in the clause-final position within a sentence rather than in the post-verbal position. Two experiments were conducted in this study, and their results suggested that having a long noun phrase affected the ordering of constituents (the noun phrase and prepositional phrase) by advanced Iranian EFL le...

Publication date: 2000